true cumulative reward
ModelFail was first introduced by Thomas and Brunskill [2016] to show the failure of the model-based approach in the
We would like to thank the reviewers for appreciating our novel contributions on the algorithmic and theoretical fronts! We focus on clarifying our experimental results in this rebuttal. Please refer to Section 5.1 (lines 258-262; there is a typo in line 262: " stands for "unobserved", is an observed variable that the policy needs to react to). Also see Section C (lines 567-575) in the supplement for more details. The time-invariant ModelWin and MountainCar domains we used in the paper are finite-horizon undiscounted MDPs.